Setup and Installation

  • Welcome to the MiamiR software package introductory tutorial, for all yourGWAS and wider genetic data visualisation needs!

  • Firstly, the recommended installation for this package is as follows:

    • Standard Method: remotes::install_github(“CRON-Project/MiamiR”)

    • Alternative Method: BiocManager::install(“CRON-Project/MiamiR”)

  • If you have any issues please raise issues through GitHub or contact me at: with subject heading indicating issue.

  • Once installed, load in the MiamiR package (lib.loc may need to be specified), This package is complete with extensive documentation and functionality.

  • A web app version will also be available soon (TBC).

library(MiamiR)

Single_Plot()

Base Example

  • Here’s how to create a Manhattan plot of the included exemplar ‘Intelligence_Sum_Stats’ summary statistics data set using the Single_Plot() function included in the package.

  • This function will return a plot object which we’ll manually save to a .jpg file in the vignettes/Plots directory of the package for future use and user access.

  • Initially, all automated, default settings will be used.

  • A progress bar will display as the function runs. This UI is the default for all MiamiR functions.

  • More detailed outputs can be viewed via setting the Verbose argument to TRUE for any MiamiR function.

plots_dir <- here("vignettes", "Plots")

Manhattan_Plot <- Single_Plot(Data = Intelligence_Sum_Stats)

setwd(plots_dir)
ggplot2::ggsave("Intelligence_Plot.jpg", plot = Manhattan_Plot, width = 30,
                 height = 15, units = "in", dpi = 100)

Let’s inspect our saved Manhattan Plot.

knitr::include_graphics(paste0(plots_dir,"/Intelligence_Plot.jpg"))

Customised Example

  • The Single_Plot() function also allows for a range of customisation. Here is a small example of such functionality.

  • It is also worth noting that running ?Single_Plot() or ? for any function or component in MiamiR will provide documentation on all functionality available and any relevant information on in-house data sets used.

Manhattan_Plot <- Single_Plot(Data = Intelligence_Sum_Stats, X_Axis_Title = "Region", 
                              Title = "Custom Plot", Label_Height = 15, 
                              Anchor_Label = "centre", Label_Angle = 90, 
                              Chromosome_Colours = c("red", "orange"), 
                              Colour_Of_Index = "blue", Diamond_Index = T, 
                              Diamond_Index_Size = 10)

setwd(plots_dir)
ggplot2::ggsave("Intelligence_Plot_Custom.jpg", plot = Manhattan_Plot, width = 30,
                 height = 15, units = "in", dpi = 100)

Let’s inspect our customised Manhattan Plot.

knitr::include_graphics(paste0(plots_dir,"/Intelligence_Plot_Custom.jpg"))

Varied Inputs Example

  • Files can also be fed in for all MiamiR functions using variables containing a file path or a raw file path itself, in addition to multiple files or current environment objects, whilst allowing for varied customisation.

  • Here, raw file paths for the above summary statistics are used alongside another included data set called ‘Household_Income_Sum_Stats’, with alternative customisation demonstrated as well.

  • Automated and unspecified defaults are also subtly adjust for different inputs across all functions.

data_dir <- here::here("inst", "extdata", "Example_Data_Raw")
setwd(data_dir)
Sum_Stats <- c(paste0(data_dir, "/Intelligence_Sum_Stats.txt"), 
               paste0(data_dir, "/Household_Income_Sum_Stats.txt"))

Manhattan_Plot <- Single_Plot(Data = Sum_Stats, Label_Angle = 60, Label_Colour = "grey", 
                              Sig_Line_Type = "solid", Condense_Scale = F )

setwd(plots_dir)
output_files <- c()
for (plot_name in names(Manhattan_Plot)) {
  file_path <- paste0(plot_name, ".jpg")
  ggplot2::ggsave(
    filename = file_path,
    plot = Manhattan_Plot[[plot_name]],
    width = 30, height = 15, units = "in", dpi = 100
  )
  output_files <- c(output_files, file_path)
}

Let’s inspect our saved, customised Manhattan Plots.

imgs <- file.path(plots_dir, output_files)
carousel(imgs)

Regional_Plot()

Base Example

  • There is also a Regional_Plot() function built around the above functionality, which has unique features in addition to seamlessly inheriting the Single_Plot() arguments to produce annotated regional plots.

  • Here is how to run the function, which automatically ascertains regions of interest, with this example focusing on key regions in chromosomes 10 and 22 of the intelligence GWAS summary statistics.

  • We will also save all generated objects as correctly sized images with guide measures also returned by the function and supplied as part of the plot objects.

  • Although defaults are suitable it is very important to specify the correct genome build (in this case HG19) for accurate recombination and gene annotation tracks/overlays, in addition to queried LD, where needed.

  Regional_Plots <- Regional_Plot(Data = Intelligence_Sum_Stats,  
                                  Genome_Build = "grch37", 
                                  Chromosomes = c(10,22))
  
  for (plot_name in names(Regional_Plots)) {
  plot <- Regional_Plots[[plot_name]]
  file_safe <- gsub("[^A-Za-z0-9]", "_", plot_name)
  filename <- paste0(file_safe, ".jpg")
  height_to_use <- attr(plot, "dynamic_height")
  if (is.null(height_to_use)) height_to_use <- 30

  setwd(plots_dir)
  ggplot2::ggsave(
    filename = filename,
    plot = plot,
    width = 30,
    height = height_to_use,  
    units = "in",
    dpi = 100,
    limitsize = FALSE
  )
}

Let’s inspect our Regional Plots for the aforementioned chromosomes.

img_dir <- plots_dir

img_paths <- list.files(
  path = img_dir,
  pattern = "Regional_Plot\\.jpg$",
  recursive = FALSE,
  full.names = TRUE
)

img_paths <- img_paths[order(file.info(img_paths)$mtime)]
carousel(img_paths)

Customised Example

  • One key customisation of the Regional_Plot() function is the ability to query for LD values around the lead SNP (with settings available for build, population etc.) and colour points in the top panel accordingly.

  • For simplicity and speed let’s focus on the single chromosome 22 peak.

  • Please bear in mind that querying an API is slow and may not always contain data pertaining to your particular set of SNPs or may be unavailable within certain system paramaters.

  • It is often better to produce your own LD matrix using tools such as PLINK1.9/2, the results of which can also be fed in to this package (FUTURE/TBC).

  Regional_Plots <- Regional_Plot(Data = Sum_Stats[1], Auto_LD = TRUE, Chromosomes = c(22), 
                                  Genome_Build = "grch37", Gene_Biotype = c("coding", "snorna"),  
                                  Gene_Biotype_Colour = c("red", "green" ),
                                  Sense_Arrow_Colour = "darkorange", 
                                  Gene_Structure_Size_Exons = 20, 
                                  Gene_Legend_Location = "Top Left",
                                  Gene_Legend_Title = "Categories", 
                                  Y_Axis_Title = "LOG10Pvalue", Diamond_Index_Size = 20)

  for (plot_name in names(Regional_Plots)) {
  plot <- Regional_Plots[[plot_name]]
  file_safe <- gsub("[^A-Za-z0-9]", "_", plot_name)
  filename <- paste0(file_safe, "_LD", ".jpg")
  height_to_use <- attr(plot, "dynamic_height")
  if (is.null(height_to_use)) height_to_use <- 30

  setwd(plots_dir)
  ggplot2::ggsave(
    filename = filename,
    plot = plot,
    width = 30,
    height = height_to_use,   
    units = "in",
    dpi = 100,
    limitsize = FALSE
  )
  
}

Let’s have a look at our heavily customised and LD annotated Regional Plot.

   img_dir <- plots_dir
  
   img_paths <- list.files(
    path = img_dir,
    pattern = "Regional_Plot_LD\\.jpg$",
    recursive = FALSE,
    full.names = TRUE
   )
  
  img_paths <- img_paths[order(file.info(img_paths)$mtime)]
  
  knitr::include_graphics(img_paths)

Complex Example

  • Let’s now pass both sets of summary statistics we used in the earlier example to show how regional plots can be constructed on distinct data sets simultaneously.

  • Both Intelligence_Sum_Stats and Household_Income_Sum_Stats seem to have a single significant association on chromosome 9.

  • Let’s see if they are in similar genomic locations…

  • For single vector arguments like Point_Colour, a list can be allocated in the order of customisation of data required.

  • Point_Colour = c(“blue”, “red”) will allocate blue to the points in Intelligence_Sum_Stats and red to the points in Household_Income_Sum_Stats. Any singularly specified arguments will apply globally.

  • Arguments which require a nested list structure, like Gene_Biotype can have two separate listed arguments passed by allotting e.g. Gene_Biotype = list( c(), c(“coding”, “processed_pseudogene” )).

  • This will allocate NULL/default settings to Intelligence_Sum_Stats and stratify the Household_Income_Sum_Stats plot by only “coding” and “processed_pseudogene” regions.

  • Please note that HG19 is passed as the genome build for Intelligence_Sum_Stats but HG38 for Household_Income_Sum_Stats due to differences in the data sets’ references.

  Regional_Plots <- Regional_Plot(Data = c(Intelligence_Sum_Stats, Household_Income_Sum_Stats),  
                                Chromosomes = c(9),  Genome_Build =  c("grch37", "grch38"), 
                                Recombination_Line_Colour = c("darkgrey", "pink"),  
                                Point_Colour = c("blue", "red"), Y_Axis_Title =  c("SigVal"), 
                                Gene_Biotype = list( c(), c("coding", "processed_pseudogene" )), 
                                Gene_Biotype_Colour = list( c(), c("blue", "green")) )
  
  for (plot_name in names(Regional_Plots)) {
  plot <- Regional_Plots[[plot_name]]
  file_safe <- gsub("[^A-Za-z0-9]", "_", plot_name)
  filename <- paste0(file_safe, "_LmultiD", ".jpg")
  height_to_use <- attr(plot, "dynamic_height")
  if (is.null(height_to_use)) height_to_use <- 30

  setwd(plots_dir)
  ggplot2::ggsave(
    filename = filename,
    plot = plot,
    width = 30,
    height = height_to_use,  
    units = "in",
    dpi = 100,
    limitsize = FALSE
  )
}

Let’s have a look at both chromosome 9 Regional Plots for our exemplar data sets.

img_dir <- plots_dir

img_paths <- list.files(
  path = img_dir,
  pattern = "Regional_Plot_LmultiD\\.jpg$", 
  recursive = FALSE,
  full.names = TRUE
)

img_paths <- img_paths[order(file.info(img_paths)$mtime)]
carousel(img_paths)

Complex Single_Plot()

  • Single_Plot() also accepts a similar logic when creating multiple plots with separate customisation simultaneously.

  • Let’s have a look at a small example.

Manhattan_Plot <- Single_Plot(Data = Sum_Stats, X_Axis_Title = "Region", Label_Height = 15, 
                              Anchor_Label = "centre", Label_Angle = c(90,45), 
                              Colour_Of_Index = "blue",  
                              Chromosome_Colours = list( c("red", "orange"),  
                                                         c("purple", "orange")  ))

setwd(plots_dir)
output_files <- c()
for (plot_name in names(Manhattan_Plot)) {
  file_path <- paste0(plot_name, "_Complex_Single", ".jpg")
  ggplot2::ggsave(
    filename = file_path,
    plot = Manhattan_Plot[[plot_name]],
    width = 30, height = 15, units = "in", dpi = 100
  )
  output_files <- c(output_files, file_path)
}

Let’s inspect our set of multiple, customised Manhattan Plots.

full_paths <- file.path(plots_dir, output_files)
carousel(full_paths)

Miami_Plot()

Base Example

  • We can also use the MiamiR package to create a Miami plot (mirrored, paired Manhattan Plot) using two sets of summary statistics using the Miami_Plot() function.

  • We will be using the same pair of data as above; ‘Intelligence_Sum_Stats’ and ‘Household_Income_Sum_Stats’, included in the MiamiR package.

  • This function automatically aligns and buffers coordinates across different data sets and genomic builds to create a neat visualisation. Let’s look at a default plot, specifying both data sets where needed.

Miami_Plot <- Miami_Plot(Top_Data = Intelligence_Sum_Stats, Bottom_Data = Household_Income_Sum_Stats)

setwd(plots_dir)
ggplot2::ggsave("Test_Miami.jpg", plot = Miami_Plot, width = 30,
                height = 20, units = "in", dpi = 100)

Let’s inspect our Miami Plot.

knitr::include_graphics(paste0(plots_dir,"/Test_Miami.jpg"))

Customised Example

  • Miami_Plot(), similarly to Regional_Plot() is based on and inherits the functionality of Single_Plot() and it can be customised using applicable arguments from previous examples.

  • However, it also allows for segregated and overall inheritance.

  • If an inherited argument like Point_Size is modified, then it will apply to the top and bottom plots within the Miami Plot.

  • If the argument is prefixed with Top_ or Bottom_ then these customisations will be allocated accordingly. Let’s start with an example of overall customisation.

Miami_Plot <- Miami_Plot(Top_Data = Intelligence_Sum_Stats, Bottom_Data = Household_Income_Sum_Stats, 
                         Chromosome_Colours = c("limegreen", "darkorange"), 
                         Chromosome_Label_Drops = c(15,19,22), Diamond_Index = TRUE, 
                         Chromosome_Label_Size = 12)

setwd(plots_dir)
ggplot2::ggsave("Custom_Test_Miami.jpg", plot = Miami_Plot, width = 30,
                height = 20, units = "in", dpi = 100)

Let’s inspect our customised Miami Plot.

knitr::include_graphics(paste0(plots_dir,"/Custom_Test_Miami.jpg"))

Complex Example

  • Let’s now use built in argument prefixing to modify the top and bottom plots independently within the Miami_Plot() function, whilst simultaneously applying global customisation.
Miami_Plot <- Miami_Plot(Top_Data = Sum_Stats[1], Bottom_Data = Sum_Stats[2], 
                         Top_Chromosome_Colours = c("purple", "grey"),
                         Top_Diamond_Index = TRUE, Bottom_Chromosome_Colours = c("pink", "red"), 
                         Chromosome_Label_Size = 30, Label_Size = 7 )

setwd(plots_dir)
ggplot2::ggsave("Split_Custom_Test_Miami.jpg", plot = Miami_Plot, width = 30,
                height = 20, units = "in", dpi = 100)  

Let’s inspect our split customisation within our new Miami Plot.

knitr::include_graphics(paste0(plots_dir,"/Split_Custom_Test_Miami.jpg"))

Forest_Plot()

Base Example

  • The MiamiR package can also be used to inspect key SNPs in single or multiple sets of GWAS summary statistics by using the Forest_Plot() function.

  • This function carefully aligns and manages SNP differences automatically and allows for much customisation.

  • Here’s how to create a Forest Plot centered around effect values of key SNPs on chromosome 4 from the previously described ‘Intelligence_Sum_Stats’ dataset.

  • The Forest_Plot() function will decipher everything from the test statistic type and suitable defaults to ascertainment of the lead SNP in each independent region, which can be defined similarly to the arguments used in Regional_Plot().

devtools::load_all("C:/Users/callumon/Miami_Package_R/MiamiR")

Forest_Plot <- Forest_Plot(Data = Intelligence_Sum_Stats, Chromosomes = 4)

setwd(plots_dir)
ggplot2::ggsave(
  filename = "Initial_Forest.jpg",
  plot = Forest_Plot,
  height = attr(Forest_Plot, "dynamic_height", exact = TRUE),
  width = 10,
  dpi = 100,
  units = "in"
)

Let’s inspect our Chromosome 4 lead SNPs Forest Plot.

knitr::include_graphics(paste0(plots_dir,"/Initial_Forest.jpg"))

Customised Example

  • We can also specify particular SNPs of interest to examine, in the desired plot order.

  • Let’s look at specific, select SNPs across both ‘Intelligence_Sum_Stats’ and ‘Household_Income_Sum_Stats’, whilst also demonstrating some of the customisation offered through MiamiR.

Forest_Plot <- Forest_Plot(Data = Sum_Stats, order = c("P_Label", "BETA_CI_Label"), Right_Size = 12,
                           Selected_SNPs = c("rs2014952", "rs759550", "rs72832626", "rs6682851"), 
                           Data_Set_Colours = c("blue", "green"), 
                           Shapes = c("triangle", "circle"), Names = c("Intel", "Income"), 
                           Double_Label = T, Left_Fill = "lightblue")
                              
setwd(plots_dir)
ggplot2::ggsave(
  filename = "Selected_Custom_Forest.jpg",
  plot = Forest_Plot,
  height = attr(Forest_Plot, "dynamic_height", exact = TRUE),
  width = 10,
  dpi = 100,
  units = "in"
)

Let’s inspect our selective Forest_Plot.

knitr::include_graphics(paste0(plots_dir,"/Selected_Custom_Forest.jpg"))

Model Example

  • The Forest_Plot() function also allows for the same figures to be produced but for covariate effects from the raw outputs of base R statistical models, particularly useful for GWAS sensitivity analyses and epidemiological profiling.

  • By specifying Model_Reference = T within Forest_Plot() and passing such model objects as the Data argument, a helper function named Model_Munge() will be called internally and automatically allocated to process such data.

  • Let’s create two models using a dataset included in the MiamiR package called ‘Fake_Demo_Data’ and pass them to the Forest_Plot() function whilst also demonstrating further customisation.

Model_One <- lm(BMI ~ Age + Sex + Ethnicity + Location, data = Fake_Demo_Data)
Model_Two <- lm(Dementia ~ Age + Sex + Ethnicity + Location, data = Fake_Demo_Data)

Forest_Plot <- Forest_Plot(Data = c(Model_One, Model_Two), order = c("P_Label", "CI_Label"),                                                          Right_Size = 12, Left_Size = 13, Model_Reference = T, Axis_Buffer = 0.05,
               Names = c("BMI", "Dementia"), Right_Fill = "lightgreen", Left_Fill = "turquoise" ,
               Strips = F, Styles = c("bold", "underline"), Null_Line_Type = "solid")
                              
setwd(plots_dir)
ggplot2::ggsave(
  filename = "Model_Custom_Forest.jpg",
  plot = Forest_Plot,
  height = attr(Forest_Plot, "dynamic_height", exact = TRUE),
  width = 10,
  dpi = 100,
  units = "in"
)

Let’s inspect our model based Forest_Plot.

knitr::include_graphics(paste0(plots_dir,"/Model_Custom_Forest.jpg"))

Model_Munge()

Base Example

  • Model_Munge() can also be used independently of Forest_Plot() on such Model Objects.

  • Let’s investigate its use with “Model_One”.

Model_Munged <- Model_Munge(Model_Object = "Model_One")

Let’s inspect our output from Model_Munge().

  head(Model_Munged, 5) |>
  kable(format = "html", digits = 3, caption = "", row.names = FALSE) |>
  kable_styling(
    full_width = TRUE,                      
    position = "center",
    bootstrap_options = c("striped", "hover")
  ) |>
  scroll_box(width = "100%")
Covariate BETA SE P group Reference
Age 0.002 0.007 0.772 Age None
Male -0.163 0.312 0.601 Sex Female
Black 0.129 0.498 0.796 Ethnicity Asian
Hispanic -0.232 0.495 0.639 Ethnicity Asian
Other 0.497 0.486 0.306 Ethnicity Asian

Annotate_Data()

Base Example

  • Another helper function which is integrated in to the Single_Plot() function under the Auto_Lab argument is Annotate_Data() which uses the coordinates of supplied data and enables annotation with RSIDs for the index SNPs on each chromosome.

  • These will be added to the data frame in a new column called Lab.

  • This is particularly useful when creating plots from coordinate-only files.

  • This queries an API in a similar way to the Auto_LD functionality in Reigonal_Plot() and as such it is important to bear in mind time constraints and requisite reference genome builds.

  • Here is how to use Labelled_Data() in isolation.

Annotated_Data <- Annotate_Data(Data = Intelligence_Sum_Stats, Genome_Build = "grch37")

Let’s inspect our annotated data frame.

  Annotated_Data[!is.na(Annotated_Data$Lab), c("CHR","POS","A1","A2","Lab")] |>
  head(5) |>
  kable(format = "html", digits = 3, caption = "") |>
  kable_styling(
    full_width = TRUE,                
    position = "center",
    bootstrap_options = c("striped", "hover") 
  ) |>
  scroll_box(width = "100%")
CHR POS A1 A2 Lab
15 51890618 t c rs118145271
6 98328774 t c rs9401295
3 50089690 t g rs2624823
18 50931300 a g rs12967910
2 162091836 t c rs3754970

Tube_Alloys()

Base Example

  • Last, but not least within the MiamiR package main functions is the Tube_Alloys() function.

  • This function combines Single_Plot(), Regional_Plot() and Forest_Plot() functionality in a blunt fashion to output overall plots, regions of interest and SNP and GWAS modelling information.

  • The user must simply pass GWAS Summary Statistic(s) via any of the aforementioned routes in addition to either combined Phenotype/Covariate files or separate Phenotype and Covariate files if sensitivity analysis is desired.

  • Here is how to run Tube_Alloys(), focusing on chromosome 9 on both ‘Intelligence_Sum_Stats’ and ‘Household_Income_Sum_Stats’.

  • In addition we’ll supply ‘Fake_PHENOS_Binary’ and ‘Fake_COVARS’ datasets included in MiamiR for modelling information.

  • The function will automatically apply defaults and auto-adjustments whilst applying focused arguments to relevant functions, where supplied data allows.

  • A special saving, helper function called save_all_plots is also included to facilitate easy output of results.

Plots <- Tube_Alloys(Data = c(Intelligence_Sum_Stats, Household_Income_Sum_Stats), 
                     Phenos = Fake_PHENOS_Binary, Covars = Fake_COVARS, Chromosomes = 1)

Tube_Alloys_Plots <- invisible(
  save_all_plots(
    Plots,
    base_dir = file.path(plots_dir, "Tube_Alloys_Plots"),
    dpi = 100
  )
)

Let’s inspect all Tube_Alloys() outputs.

all_files <- unlist(Tube_Alloys_Plots, use.names = FALSE)
carousel(all_files)

METASOFT_File_Gen()

Base Example

  • A utility function also included in the MiamiR package is the METASOFT_File_Gen() function which takes multiple datasets as inputs and munges them by default to the METASOFT meta-analysis format (wrapper FUTURE/TBC).

  • Other formats may be specified and allele matching is automatically taken care of, similarly to the Forest_Plot() function, with outputs maintaining all SNPs which match the backbone dataset.

  • Here is how to use this simple function using our usual data sets.

METASOFT <- METASOFT_File_Gen(Data = c("Intelligence_Sum_Stats", "Household_Income_Sum_Stats"))

Let’s preview the METASOFT_File_Gen() output.

  na.omit(METASOFT) |>
  head(5) |>
  kable(format = "html", digits = 3, caption = "") |>
  kable_styling(
    full_width = TRUE,                
    position = "center",
    bootstrap_options = c("striped", "hover")  
  ) |>
  scroll_box(width = "100%")
ID_DIR Intelligence_Sum_Stats_BETA Intelligence_Sum_Stats_SE Household_Income_Sum_Stats_BETA Household_Income_Sum_Stats_SE
chr9:90012825:G:A -0.003 0.005 0.001 0.002
chr18:34261100:G:A 0.003 0.018 0.000 0.002
chr3:95042569:G:T -0.008 0.003 0.001 0.002
chr10:126270188:G:C 0.008 0.007 0.003 0.002
chr6:54388932:G:A -0.016 0.034 0.001 0.005


Thank you for reading the tutorial vignette! More functions and functionality coming soon :)